Sorting, Searching, and Simulation in the MapReduce Framework

نویسندگان

  • Michael T. Goodrich
  • Nodari Sitchinava
  • Qin Zhang
چکیده

In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation problems. This study is motivated by a goal of ultimately putting the MapReduce framework on an equal theoretical footing with the well-known PRAM and BSP parallel models, which would benefit both the theory and practice of MapReduce algorithms. We describe efficient MapReduce algorithms for sorting, multi-searching, and simulations of parallel algorithms specified in the BSP and CRCW PRAM models. We also provide some applications of these results to problems in parallel computational geometry for the MapReduce framework, which result in efficient MapReduce algorithms for sorting, 2and 3-dimensional convex hulls, and fixed-dimensional linear programming. For the case when mappers and reducers have a memory/message-I/O size of M = Θ(N ), for a small constant > 0, all of our MapReduce algorithms for these applications run in a constant number of rounds. ar X iv :1 10 1. 19 02 v1 [ cs .D C ] 1 0 Ja n 20 11

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

MapReduce: Distributed Computing for Machine Learning

We use Hadoop, an open-source implementation of Google’s distributed file system and the MapReduce framework for distributed data processing, on modestly-sized compute clusters to evaluate its efficacy for standard machine learning tasks. We show benchmark performance on searching and sorting tasks to investigate the effects of various system configurations. We also distinguish classes of machi...

متن کامل

Simulating Parallel Algorithms in the MapReduce Framework with Applications to Parallel Computational Geometry

In this paper, we describe efficient MapReduce simulations of parallel algorithms specified in the BSP and PRAM models. We also provide some applications of these simulation results to problems in parallel computational geometry for the MapReduce framework, which result in efficient MapReduce algorithms for sorting, 1-dimensional all nearest-neighbors, 2-dimensional convex hulls, 3-dimensional ...

متن کامل

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...

متن کامل

Optimization and analysis of large scale data sorting algorithm based on Hadoop

When dealing with massive data sorting, we usually use Hadoop which is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. A common approach in implement of big data sorting is to use shuffle and sort phase in MapReduce based on Hadoop. However, if we use it directly, the efficiency could be very low and the loa...

متن کامل

A Study of Many-Core Hardware Accelerated Hadoop MapReduce

MapReduce is a widely used framework for massive data processing. It was originally designed to overcome the I/O bottleneck, and enabled us to process Bigdata with the commodity clusters systems. However, several existing work have recently shown that the emerging high speed storage and network devices are capable to remove the I/O bottleneck and made the CPU the next serious bottleneck in the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011